Abstract

Background: Several reports describe thousands of SSR markers in the peanut (Arachis hypogaea L.)
genome, and there is a need to integrate these reports of peanut DNA polymorphism into a single platform.
Further, because markers are not labeled uniformly across publications, the identities of many markers
are unclear. We describe an effort to develop a central, comprehensive database of polymorphic SSR
markers in peanut.

Findings: Of a total of 9,274 markers, we compiled 1,343 SSR markers (14.5%) that detect polymorphism.
Among the polymorphic SSRs examined, the AG motif (36.5%) was the most abundant, followed by
AAG (12.1%), AAT (10.9%), and AT (10.3%). The mean repeat length of dinucleotide SSRs was significantly
longer than that of trinucleotide SSRs. Among genomic SSRs, dinucleotide SSRs showed a higher polymorphism
frequency than trinucleotide SSRs, whereas among EST-SSRs the frequency of polymorphic SSRs was higher in
trinucleotide SSRs than in dinucleotide SSRs. Comparison of SSR length with the frequency of
polymorphism showed that the frequency of polymorphism decreased as the motif repeat number increased.

Conclusions: The assembled polymorphic SSRs should help to increase the density of existing genetic maps
of peanut. They are also a useful source of DNA markers for high-throughput QTL mapping and
marker-assisted selection in peanut improvement, and should therefore be of value to breeders.
Findings

Background 

Cultivated peanut (Arachis hypogaea L.) is among the most important legume crops and a
valuable source of oil and protein. Grown on six continents, it is economically the second
most important legume in the U.S. Peanuts are planted annually on about 22 million ha
worldwide, with a production of 35 million tons (source:
http://www.agrostats.com/world-statistic/world-peanut.html).

Peanut is a self-pollinated allotetraploid (2n = 4x = 40) crop with a large genome (2.8 Gbp).
Unlike many other polyploid crop species, cultivated peanut is generally believed to be
monophyletic in origin [1]. Thus, peanut germplasm exhibits far less molecular genetic
variation than most other cultivated crops, and fewer DNA markers have been detected in this
crop. Consequently, marker-assisted selection, now an important tool in the improvement of
many crops, has yet to play a significant role in peanut breeding. The paucity of DNA markers
has also resulted in an inadequate understanding of the nature and evolution of the peanut
genome.

During the past two decades, much effort has been made to develop genetic and genomic tools
in cultivated peanut, such as the construction of BAC libraries [2,3], cDNA libraries [4-7],
and genetic linkage maps [8-18], and the development of DNA markers [19-36]. Among the
various molecular markers investigated so far, simple sequence repeats (SSRs) have emerged as
the preferred DNA marker system for genetic and genomic studies in cultivated peanut
[10,11,18,23,26-28,32,33]. To date, nearly 10,000 SSRs have been identified by research
groups around the world. Initial development of SSR markers in peanut relied on
SSR-containing DNA fragments enriched from genomic libraries using various SSR probes.
Currently, SSRs are increasingly developed through data mining of EST and BAC-end sequences.
Although there are 32 publications on peanut DNA markers so far, there is a need to analyze
all existing SSR markers in peanut and to develop a central database of polymorphic SSRs with
unambiguous labels, gleaned from the published literature and the public genome database.
Such a comprehensive review of polymorphic SSRs would help to advance peanut research and
improvement, as it would provide an overall snapshot of all existing DNA markers and of those
that are polymorphic. Further, there is considerable interest among peanut breeders in
introducing useful genes from wild species to improve genetic diversity through
marker-assisted selection with polymorphic markers.



Methods 

Information on publicly available peanut SSRs was collected by scanning scientific
publications. Redundant primer sequences were detected by a BLAST sequence similarity search
against legacy Arachis SSR primer sequences with an E-value cutoff of 1e-20. DNA sequences
containing polymorphic SSRs were re-searched for motif and repeat number using SSRIT
software. Polymorphic SSRs and their polymorphism information content (PIC) values were
collected from original and cited publications, or were determined by the authors through
laboratory testing for polymorphism on a panel of eight cultivated peanut genotypes. These
eight cultivated varieties, viz. Tifrunner, GT-C20, SunOleic 97R, NC94022, Yue you 92, Xin
Hui Xiao Li, D99, and H22, are also the parental genotypes of four mapping populations.
Genomic DNA was extracted from these genotypes using the MasterPure Plant Leaf DNA
Purification Kit (Epicentre, Madison, WI). The PCR program consisted of initial denaturation
at 94°C for 3 min, followed by 35 cycles of 94°C for 30 sec, 55°C for 30 sec, and 72°C for
30 sec, and a final extension at 72°C for 5 min. PCR products were resolved on polyacrylamide
gels in a LI-COR 4300 DNA Analyzer (LI-COR, Lincoln, NE). All polymorphic SSRs were listed in
a Microsoft Excel file as a reference, and GenBank accession numbers were included wherever
available so that the original flanking sequences can be tracked by hyperlink. SSRs mapped in
published genetic linkage maps were highlighted with the authors' names. The species and DNA
source from which each SSR was identified are also shown, to distinguish genomic SSRs from
EST-SSRs and cultivated-species SSRs from wild-species SSRs.

Table 1 List of total publicly available and polymorphic SSR markers in peanut

Marker name (prefix)                EST or genomic SSR  Total no. of SSRs developed  No. of polymorphic SSRs  No. of mapped SSRs  Publication
Ah4-xx, Ah6-xx                      Genomic             26                           6                        4                   [23]
Apxx                                Genomic             7                            2                        1                   [25]
PMxx                                Genomic             275                          43                       33                  [26,27,35]
PMxx                                EST                 44                           5                        4                   [27]
pPGPseqxx, pPGSseqxx                Genomic             226                          140                      93                  [28]
Ah-xx                               Genomic             67                           12                       7                   [29]
AM xx, Ah2xx, gi-xx                 Genomic             121                          84                       75                  [10]
AS1RNxx, AS1R1xx, AS1MLxx, gi-xx    EST                 112                          20                       12                  [10]
GAxx                                Genomic             103                          46                       35                  [30]
AS1RNxx, AS1RMxx                    EST                 107                          14                       7                   [5]
Ahxx                                Genomic             13                           9                        7                   [37]
S-xx                                Genomic             123                          45                       4                   [31]
EM-xx, EE-xx                        EST                 290                          29                       9                   [6]
IPAHMxx                             Genomic             170                          54                       36                  [32]
GMxx                                EST                 2,138                        156                      133                 [34]
AHMxx                               Genomic             2                            2                        0                   [38]
AHBGSxx                             EST                 35                           2                        0                   [11]
ICGMxx                              Genomic             23                           8                        0                   [33]
Fxx, PDxx                           EST                 33                           4                        0                   [39]
AHSxx                               EST                 3,187                        373                      9                   [7,18]
GNBxx                               Genomic             1,152                        167                      79                  [18]
Adxx, Aixx                          Genomic             167                          13                       10                  [18]
AHGSxx                              Genomic             706                          23                       17                  [18]
Ah3xx                               Genomic             147                          86                       18                  [36]
Total                                                   9,274                        1,343 (14.5%)            593 (6.4%)
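
The motif and repeat-number search described in the Methods can be illustrated with a minimal Python sketch. It is not SSRIT and does not reproduce its behavior; it only scans for perfect dinucleotide and trinucleotide repeats with a regular expression. The function name `find_ssrs`, the minimum repeat thresholds (5 for dinucleotides, 4 for trinucleotides) and the test sequence are assumptions made for illustration.

```python
import re

# Assumed minimum repeat counts per motif length (illustrative, not from SSRIT).
DEFAULT_MIN_REPEATS = {2: 5, 3: 4}

def find_ssrs(seq, min_repeats=None):
    """Return (start, motif, repeat_count) for perfect di-/trinucleotide SSRs."""
    thresholds = DEFAULT_MIN_REPEATS if min_repeats is None else min_repeats
    seq = seq.upper()
    hits = []
    for motif_len, min_rep in sorted(thresholds.items()):
        # ([ACGT]{k}) captures a candidate motif; \1{n,} demands n more tandem copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (motif_len, min_rep - 1))
        for match in pattern.finditer(seq):
            motif = match.group(1)
            if len(set(motif)) == 1:
                continue  # skip mononucleotide runs such as AAAAAAAAAA
            hits.append((match.start(), motif, len(match.group(0)) // motif_len))
    return hits

# Invented test sequence: an (AG)8 repeat and an (AAT)5 repeat with short flanks.
example = "TTGC" + "AG" * 8 + "CCGG" + "AAT" * 5 + "GC"
hits = find_ssrs(example)
```

Here `hits` holds one entry per repeat, giving the 0-based start position, the motif as it appears in the sequence, and the repeat number; multiplying motif length by repeat number gives the SSR length in bp used later in the text.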

Findings 

Redundancy among SSRs developed by different research groups, together with the use of
non-uniform marker names, has resulted in duplicate genotyping of peanut germplasm and
inefficient use of resources in peanut genomics. There is therefore a need for a central
repository of informative SSR markers for peanut that includes all published markers without
redundancy and uses unique, unambiguous marker names. We have attempted to develop such a set
of polymorphic SSR markers in peanut.

The total number of SSRs reported to date in both cultivated and wild peanut species in the
published literature is 9,274 (Table 1). Of these, we identified 1,343 SSR markers (14.5% of
the total) that detected variation within peanut germplasm. We further analyzed these
polymorphic SSRs to gain insights into their nature and frequency. All published SSRs are
summarized in Table 1, which shows the source, name, and numbers of developed, polymorphic,
and mapped SSRs. Most sequences ranged from 100 to 500 bp in length. Assuming an average
length of 250 bp for SSR-containing sequences, these SSRs would cover 2.3 Mbp, which
corresponds to 0.083% of the peanut genome (2,800 Mbp). Among these SSRs, 5,946 were EST-SSRs
and 3,328 were genomic SSRs, of which 603 (10.1%) and 740 (22.2%), respectively, were
confirmed to be polymorphic.
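
The coverage estimate and the polymorphism frequencies in this paragraph follow from simple arithmetic, sketched below in Python. All numbers are taken from the text; the variable and function names are illustrative only, and the 250 bp mean length is the assumption stated in the text.

```python
TOTAL_SSRS = 9274
ASSUMED_MEAN_LENGTH_BP = 250          # assumption stated in the text
GENOME_SIZE_BP = 2_800_000_000        # 2,800 Mbp

# Sequence covered by all SSR-containing sequences, and its share of the genome.
covered_bp = TOTAL_SSRS * ASSUMED_MEAN_LENGTH_BP
genome_share_pct = 100 * covered_bp / GENOME_SIZE_BP

def polymorphism_rate(polymorphic, total):
    """Percentage of markers that were polymorphic."""
    return 100 * polymorphic / total

est_rate = polymorphism_rate(603, 5946)        # EST-SSRs
genomic_rate = polymorphism_rate(740, 3328)    # genomic SSRs
overall_rate = polymorphism_rate(603 + 740, 5946 + 3328)

print(f"{covered_bp / 1e6:.1f} Mbp, {genome_share_pct:.3f}% of the genome")
print(f"EST {est_rate:.1f}%, genomic {genomic_rate:.1f}%, overall {overall_rate:.1f}%")
```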

Additional file 1 provides descriptive information on the polymorphic SSR markers, including
marker name, primer name, alternative name, and GenBank accession number where available.
These polymorphic SSRs were identified by various research groups around the world, which
often used different names for the same SSR marker. In some instances, two different markers
have very similar names, adding to the confusion; for example, Ah-xx developed by [29] and
Ahxx developed by [37] look similar but come from different publications.



Some markers with distinct names, such as IPAHMxx and XIPxx, are in fact the same markers but
can easily be mistaken for different ones. Further, a marker name and its primer name are
sometimes referred to as if they were different markers, such as marker name Ah1TC3A12 with
primer name TC3A12, both of which could be mapped on the same genetic linkage map. In
Additional file 1, we present a list of such redundant markers in an effort to eliminate
duplicate naming of markers. All polymorphic SSR markers listed in Additional file 1 are
provided with clear information on their source, origin and nature. We believe that such a
snapshot of information on all available polymorphic SSRs in peanut will serve as a useful
resource for high-throughput genotyping by array-based platforms in QTL mapping and
marker-assisted selection in peanut breeding.
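
One simple way to surface markers that carry different names but share the same primers is to group them by primer pair, as in the Python sketch below. This is a deliberately simplified stand-in for the BLAST-based similarity search described in the Methods: it only catches exact matches after case and whitespace normalization. The marker names and primer sequences are invented for illustration.

```python
from collections import defaultdict

def group_by_primer_pair(markers):
    """Map each (forward, reverse) primer pair to the marker names sharing it.

    Only pairs used by more than one marker name are returned.
    """
    groups = defaultdict(list)
    for name, forward, reverse in markers:
        key = (forward.strip().upper(), reverse.strip().upper())
        groups[key].append(name)
    return {key: names for key, names in groups.items() if len(names) > 1}

# Hypothetical records: (marker name, forward primer, reverse primer).
markers = [
    ("LabA_001", "ACGTTGCAAGTCCA", "TTGACCGTAGGATC"),
    ("LabB_X17", "acgttgcaagtcca", "ttgaccgtaggatc "),  # same pair, other name
    ("LabC_042", "GGATCCATTGACGA", "CCATTGGACTTAGC"),
]
redundant = group_by_primer_pair(markers)
alias_groups = list(redundant.values())
```

Each entry in `alias_groups` is a set of names that should be collapsed under one unambiguous label in a central marker list.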

Among the 1,343 polymorphic SSR markers, dinucleotide and trinucleotide motifs were
predominant, and only a few markers carried other motif types. In total, 1,508 di- and
trinucleotide motifs were identified and sorted as EST-SSRs or genomic SSRs (Table 2). EST
sequences harbored 597 SSR motifs, among which AAG (21.1%) and AG (20.9%) were the most
abundant. Genomic SSRs had 911 motifs, among which AG was the most abundant (46.7%), followed
by AT (13.6%), AC (12.3%), and AAT (12.0%). The high percentage of the AG motif in genomic
sequences might reflect a bias stemming from the use of dinucleotide SSR probes, such as
(AG)n, in the enrichment approach used to identify SSRs in the peanut genome [26].
Interestingly, with one exception, no EST-SSR or genomic SSR with a CG motif was found to be
polymorphic. A similar result was reported by Moretzsohn et al. 2005 [10]. Overall, AG
(36.5%) was the most frequent polymorphic SSR motif across EST-SSRs and genomic SSRs
combined, followed by AAG (12.1%), AAT (10.9%), AT (10.3%), and AC (8.6%) (Table 2). These
are also the motifs that are generally most abundant in the peanut genome [18,28], whereas in
the soybean genome a search of whole-genome sequences found AT, AAT, and AAAT to be the most
abundant motifs [39].

Table 2 Distribution of various types of motifs in polymorphic EST-SSRs and genomic-SSRs

Motif                     EST-SSR (%)   Genomic-SSR (%)   Total (%)
AT/TA                     31 (5.2)      124 (13.6)        155 (10.3)
AG/GA/CT/TC               125 (20.9)    425 (46.7)        550 (36.5)
AC/CA/TG/GT               18 (3.0)      112 (12.3)        130 (8.6)
GC/CG                     1 (0.2)       0 (0.0)           1 (0.1)
AAG/AGA/GAA/CTT/TTC/TCT   126 (21.1)    57 (6.3)          183 (12.1)
AAT/ATA/TAA/ATT/TTA/TAT   55 (9.2)      109 (12.0)        164 (10.9)
ATG/TGA/GAT/CAT/ATC/TCA   47 (7.9)      18 (2.0)          65 (4.3)
AAC/ACA/CAA/GTT/TTG/TGT   46 (7.7)      34 (3.7)          80 (5.3)
ACC/CCA/CAC/GGT/GTG/TGG   32 (5.4)      10 (1.1)          42 (2.8)
AGG/GGA/GAG/CCT/CTC/TCC   35 (5.9)      4 (0.4)           39 (2.6)
AGT/GTA/TAG/ACT/CTA/TAC   9 (1.5)       5 (0.5)           14 (0.9)
AGC/GCA/CAG/GCT/CTG/TGC   30 (5.0)      4 (0.4)           34 (2.3)
ACG/CGA/GAC/CGT/GTC/TCG   14 (2.3)      2 (0.2)           16 (1.1)
GGC/GCG/CGG/GCC/CCG/CGC   28 (4.7)      7 (0.8)           35 (2.3)
Total                     597           911               1,508
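
Motif classes such as AG/GA/CT/TC group a motif with its rotations and with the rotations of its reverse complement, since all of these describe the same repeat read from a different start position or strand. A minimal Python sketch of that grouping is given below; choosing the alphabetically smallest member as the class label is a convention assumed here for illustration, not something stated in the source.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(motif):
    """Reverse complement of a DNA motif (upper-case A/C/G/T only)."""
    return motif.translate(COMPLEMENT)[::-1]

def canonical_motif(motif):
    """Label a motif by the alphabetically smallest member of its class.

    The class contains every rotation of the motif and every rotation of
    its reverse complement.
    """
    motif = motif.upper()
    candidates = []
    for strand in (motif, reverse_complement(motif)):
        for shift in range(len(strand)):
            candidates.append(strand[shift:] + strand[:shift])
    return min(candidates)

# All four dinucleotides of the AG class collapse to one label.
ag_class = {canonical_motif(m) for m in ("AG", "GA", "CT", "TC")}
# Likewise the six trinucleotides of the AAG class.
aag_class = {canonical_motif(m) for m in ("AAG", "AGA", "GAA", "CTT", "TTC", "TCT")}
```

Counting markers by `canonical_motif` rather than by the raw motif string avoids splitting one motif class across several rows.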

Comparison of SSR lengths revealed that the mean length of dinucleotide SSRs, measured as
repeat number, was significantly greater than that of trinucleotide SSRs for both EST-SSRs
and genomic SSRs (t = 12.48 and t = 8.79, respectively; p < 0.0001) (Table 3). This finding
is consistent with observations in barley [40], sugarcane [41] and soybean [42]. When the
frequency of polymorphism was compared, it was significantly higher for genomic SSRs than for
EST-SSRs among dinucleotide SSRs, but lower for genomic SSRs than for EST-SSRs among
trinucleotide SSRs (Fisher's exact test, P < 0.0001).
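
The t statistics can be roughly checked from the summary values in Table 3. The source does not say which form of the t-test was used; the Python sketch below assumes a pooled-variance (Student's) two-sample test and works only from the rounded means, standard deviations and motif counts, so it yields values close to, but not identical with, the reported 12.48 and 8.79. The function name is illustrative.

```python
from math import sqrt

def pooled_t(n1, mean1, sd1, n2, mean2, sd2):
    """Two-sample t statistic with pooled variance, from summary statistics."""
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / (n1 + n2 - 2)
    standard_error = sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (mean1 - mean2) / standard_error

# Dinucleotide vs trinucleotide repeat numbers, values from Table 3.
t_est = pooled_t(175, 11.35, 5.69, 422, 6.94, 2.79)        # EST-SSRs
t_genomic = pooled_t(661, 16.50, 8.88, 250, 11.05, 6.95)   # genomic SSRs

print(f"EST-SSR t = {t_est:.2f}, genomic SSR t = {t_genomic:.2f}")
```

Both statistics are far above any conventional critical value for samples of this size, in line with the reported p < 0.0001.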

Many studies have reported that SSRs with longer repeat length are more polymorphic in plant
species [10,18,43,44]. In this study, dinucleotide SSRs had the longer mean repeat length,
yet among EST-SSRs they showed lower polymorphism frequencies than trinucleotide SSRs. This
may be because changes in dinucleotide repeat length in exons are likely to be suppressed,
owing to the deleterious frameshift mutations that they would frequently cause in translated
regions [42,45]. Expansion or contraction of SSR repeat length can occur through replication
slippage, which is considered one of the main causes of SSR mutations. SSR instability also
depends on motif size, nucleotide content and SSR length [46].

Table 3 Comparison of motif number and mean repeat number between dinucleotide and trinucleotide SSRs

Motif          EST-SSR: no. of motifs  EST-SSR: repeat number (mean ± SD)  Genomic SSR: no. of motifs  Genomic SSR: repeat number (mean ± SD)
Dinucleotide   175                     11.35 ± 5.69                        661                         16.50 ± 8.88
Trinucleotide  422                     6.94 ± 2.79                         250                         11.05 ± 6.95

The relationship between the length of an SSR and the frequency of polymorphism was also
analyzed by comparing the repeat number of SSRs with the number of polymorphic SSRs
(Figure 1). In general, as the repeat number increased, the number of polymorphic SSRs
decreased for both dinucleotide and trinucleotide SSRs. The correlation coefficient between
the number of polymorphic SSRs and the repeat number was -0.945 for dinucleotide SSRs and
-0.661 for trinucleotide SSRs. In dinucleotide SSRs, repeat numbers from 5 to 23 (lengths of
10 to 46 bp) displayed higher frequencies of polymorphism, i.e. more than 25 polymorphic SSRs
for each repeat number within this range. At higher repeat numbers, the count dropped to
fewer than 20 polymorphic SSRs. In trinucleotide SSRs, however, repeat numbers from 4 to 9
(12-27 bp) each exhibited more than 40 polymorphic SSRs. At repeat numbers of 10 (30 bp) and
above, the number of polymorphic SSRs dropped steeply to fewer than 25. The peak of the
frequency of polymorphism did not always follow the same pattern across motifs. For instance,
for motif AAG the higher frequency of polymorphism occurred only at repeat numbers 4 to 7
(12-21 bp), while for motif AAT the highest frequency of polymorphism occurred at repeat
numbers 5-6 (15-18 bp) and 14-20 (42-60 bp). Nevertheless, the distribution of polymorphic
SSRs among the different repeat numbers was generally skewed toward smaller numbers of
repeats. This might simply be because SSRs with fewer repeats have been identified at higher
frequencies than those with larger repeat numbers. In common bean, Blair et al. [47]
similarly reported that the number of SSRs decreased as the repeat number increased.
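
The correlation coefficients quoted above relate repeat number to the count of polymorphic SSRs at that repeat number. The Python sketch below shows how such a Pearson coefficient is computed. The counts are invented solely to demonstrate the calculation; they are not the data behind Figure 1 and do not reproduce the reported values of -0.945 and -0.661.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return covariance / sqrt(var_x * var_y)

# Invented example: counts of polymorphic SSRs falling as repeat number rises.
repeat_numbers = [5, 10, 15, 20, 25, 30]
polymorphic_counts = [60, 52, 41, 30, 18, 9]   # hypothetical, not from Figure 1
r = pearson_r(repeat_numbers, polymorphic_counts)
```

A strongly negative `r` of this kind describes how many polymorphic markers exist at each repeat number; it does not by itself show that longer SSRs are less likely to be polymorphic, because shorter SSRs are also more numerous overall.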

Temnykh et al. [48] proposed a length threshold to separate long from short SSRs: SSRs longer
than 20 bp are considered long ("class I"), while those shorter than 20 bp are considered
short ("class II"). Using this criterion, among dinucleotide SSRs we found 534 SSRs longer
than 20 bp (class I) and 302 SSRs in the short range (class II). From this perspective,
longer SSRs are more often polymorphic than short SSRs, even though the number of polymorphic
SSRs is strongly negatively correlated with repeat number (correlation coefficient of
-0.945). In trinucleotide SSRs, however, the number of long SSRs (333) was similar to the
number of short SSRs (339). When dinucleotide and trinucleotide SSRs are considered together,
long SSRs (867) indeed outnumber short SSRs (641), which is consistent with many previous
reports [10,18,43,44].
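
The class I / class II split can be sketched in Python as a function of motif length and repeat number. The text leaves open how an SSR of exactly 20 bp is handled; the sketch assumes it counts as class I, and that choice, like the function name, is an assumption made for illustration.

```python
CLASS_I_MIN_BP = 20  # assumption: an SSR of exactly 20 bp is treated as class I

def ssr_class(motif, repeat_number):
    """Classify an SSR as 'class I' (long) or 'class II' (short) by length in bp."""
    length_bp = len(motif) * repeat_number
    return "class I" if length_bp >= CLASS_I_MIN_BP else "class II"

examples = [("AG", 10), ("AG", 9), ("AAT", 7), ("AAG", 6)]
classes = [ssr_class(motif, repeats) for motif, repeats in examples]

# Class I totals quoted in the text: dinucleotide plus trinucleotide SSRs.
long_total = 534 + 333
```

Because a dinucleotide SSR reaches 20 bp at 10 repeats while a trinucleotide SSR needs 7, the same repeat number can fall into different classes depending on motif size.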

The increasing availability, affordability and accessibility of molecular markers are
facilitating the development of genetic linkage maps in all major crops. Although the first
peanut genetic linkage map was reported by [8] using RFLP markers in a wild species × wild
species population, no genetic linkage map was developed for cultivated × cultivated peanut
until 15 years later, when considerable numbers of SSR markers had become available.
Figure 1 (a) Relationship between repeat number and number of polymorphic SSR motifs in dinucleotide SSRs. (b) Relationship between
repeat number and number of polymorphic SSR motifs in trinucleotide SSRs. The x-axis shows the number of repeats (4 to >32); the
series are the individual motif classes and their total.



While SSRs have become increasingly important tools for molecular genetic analysis, another
potentially useful and widely used marker type, the single nucleotide polymorphism (SNP), has
not yet been developed in peanut.

To date, seven genetic linkage maps have been published for cultivated × cultivated
populations using SSR markers [12-14,16-18,49]. Among the 1,343 polymorphic SSRs that we
assembled, 593 were mapped in these seven maps (Table 1; Additional file 1). When these maps
were constructed, only about six hundred polymorphic SSR markers were available in total.
Consequently, the number of mapped SSR loci in these genetic maps ranged only from 131 to
324, and the maps still need to be saturated with more markers for further molecular
research, such as QTL mapping, map-based cloning, and marker-assisted selection in peanut
breeding. With a total of 1,343 polymorphic markers now available, including recently
generated BAC-end sequence SSRs [18], EST-SSRs [7], and genomic SSRs [36], we presume that
construction of a higher-density genetic linkage map with ~500 SSR loci in cultivated peanut
is feasible.

Molecular markers are frequently polymorphic in one population but monomorphic in another.
Among the seven genetic linkage maps in cultivated peanut, two were constructed using mapping
populations from China, three from India, and two from the USA. Some of the informative SSR
markers detected polymorphism in only one of the three regional populations, indicating that
there is genetic variation between regional populations, presumably due to differences in
their lineages. However, 45 SSR markers consistently detected polymorphism across all
regional populations of peanut from China, India and the USA. These SSR markers may thus
represent the most variable markers detected so far within the peanut genome, corresponding
to frequently mutating loci in this crop.

Zhao et al. BMC Research Notes 2012, 5:362
http://www.biomedcentral.com/1756-0500/5/362